624 research outputs found

    Feasible Policy Iteration

    Safe reinforcement learning (RL) aims to solve an optimal control problem under safety constraints. Existing direct safe RL methods use the original constraint throughout the learning process. They either lack theoretical guarantees of the policy during iteration or suffer from infeasibility problems. To address this issue, we propose an indirect safe RL method called feasible policy iteration (FPI) that iteratively uses the feasible region of the last policy to constrain the current policy. The feasible region is represented by a feasibility function called constraint decay function (CDF). The core of FPI is a region-wise policy update rule called feasible policy improvement, which maximizes the return under the constraint of the CDF inside the feasible region and minimizes the CDF outside the feasible region. This update rule is always feasible and ensures that the feasible region monotonically expands and the state-value function monotonically increases inside the feasible region. Using the feasible Bellman equation, we prove that FPI converges to the maximum feasible region and the optimal state-value function. Experiments on classic control tasks and Safety Gym show that our algorithms achieve lower constraint violations and comparable or higher performance than the baselines.
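The region-wise update described in this abstract can be illustrated with a small tabular sketch. This is not the authors' implementation: the tables `q` and `cdf`, the `threshold` value, and the function name `feasible_policy_improvement` are hypothetical, and the sketch assumes a state is inside the feasible region when the previous policy's action has a CDF value at or below the threshold.

```python
def feasible_policy_improvement(q, cdf, old_policy, threshold=0.0):
    """One region-wise greedy update on a tabular problem (illustrative only).

    q[s][a]       -- hypothetical action-value estimates (return to maximize)
    cdf[s][a]     -- hypothetical feasibility-function values (lower is safer)
    old_policy[s] -- action chosen by the previous policy in state s
    """
    new_policy, feasible = [], []
    for s, old_action in enumerate(old_policy):
        actions = range(len(q[s]))
        # Assumption: the state is feasible if the old policy's action
        # satisfies the CDF constraint.
        inside = cdf[s][old_action] <= threshold
        if inside:
            # Inside the region: maximize return among actions that satisfy
            # the CDF constraint. The old action always qualifies, so the
            # candidate set is never empty.
            allowed = [a for a in actions if cdf[s][a] <= threshold]
            best = max(allowed, key=lambda a: q[s][a])
        else:
            # Outside the region: ignore return and minimize the CDF.
            best = min(actions, key=lambda a: cdf[s][a])
        new_policy.append(best)
        feasible.append(inside)
    return new_policy, feasible


# Toy data: 3 states, 2 actions (all values made up for illustration).
q = [[1.0, 2.0], [0.5, 3.0], [0.0, 1.0]]
cdf = [[0.0, 0.0], [0.0, 0.4], [0.7, 0.2]]
old_policy = [0, 0, 0]

new_policy, feasible = feasible_policy_improvement(q, cdf, old_policy)
print(new_policy, feasible)
```

In state 1 the higher-return action is rejected because it violates the constraint, while in state 2, which lies outside the region, the update picks the action with the smaller CDF value regardless of return.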

    Willingness to pay for climate change mitigation: evidence from China

    China has become the largest emitter of carbon dioxide in the world. However, the Chinese public's willingness to pay (WTP) for climate change mitigation is, at best, under-researched. This study draws upon a large national survey of Chinese public cognition and attitude towards climate change and analyzes the determinants of consumers' WTP for energy-efficient and environment-friendly products. Eighty-five percent of respondents indicate that they are willing to pay at least 10 percent more than the market price for these products. The econometric analysis indicates that income, education, age and gender, as well as public awareness and concerns about climate change are significant factors influencing WTP. Respondents who are more knowledgeable and more concerned about the adverse effect of climate change show higher WTP. In comparison, income elasticity is small. The results are robust to different model specifications and estimation techniques. © 2016 by the IAEE. All rights reserved.